SVM

Before moving forward with the to-do list, let’s throw a Random Forest at it.

SVM

For many reasons, Random Forest is usually a very good baseline model. In this particular case I started with polynomial OLS as the baseline model, simply because the correlations made it evident that the relationship between temperature and consumption follows a polynomial shape. But let’s go back to our beloved RF.

/home/runner/work/strom/strom/.venv/lib/python3.12/site-packages/sklearn/svm/_classes.py:31: FutureWarning:

The default value of `dual` will change from `True` to `'auto'` in 1.5. Set the value of `dual` explicitly to suppress the warning.

/home/runner/work/strom/strom/.venv/lib/python3.12/site-packages/sklearn/svm/_base.py:1237: ConvergenceWarning:

Liblinear failed to converge, increase the number of iterations.
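
Both warnings above point at estimator settings. The following is a minimal sketch of one way to address them, assuming the estimator is scikit-learn's `LinearSVR` (the actual pipeline built inside `strom.modelling` is not shown here). The synthetic data, the scaler step, and the `max_iter` value are illustrative assumptions, not the project's configuration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Synthetic stand-in data (NOT the project's data): consumption vs. temperature.
rng = np.random.default_rng(0)
temperature = rng.uniform(-5.0, 30.0, size=200)
consumption = 5.0 + 0.05 * (temperature - 18.0) ** 2 + rng.normal(0.0, 1.0, size=200)
X = temperature.reshape(-1, 1)

model = make_pipeline(
    # Unscaled features are a common reason for liblinear to converge slowly.
    StandardScaler(),
    LinearSVR(
        dual=True,        # current default, stated explicitly to silence the FutureWarning
        max_iter=10_000,  # illustrative value, higher than the default of 1000
        random_state=0,
    ),
)
model.fit(X, consumption)
predictions = model.predict(X)
```

Whether a larger `max_iter` alone is enough depends on the data; if the warning persists, feature scaling and the regularization strength `C` are the usual next things to look at.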

Model Cards provide a framework for transparent, responsible reporting. 
 Use the vetiver `.qmd` Quarto template as a place to start, 
 with vetiver.model_card()
Writing pin:
Name: 'wd-svm'
Version: 20251108T185012Z-63aa9
♻️  stepit 'svm_raw': is up-to-date. Using cached result for `strom.modelling.assess_model()` 2025-11-08 18:50:12

Metrics

                                         Single Split             CV
Metric                                   train       test        test        train
MAE - Mean Absolute Error                2.759698    2.495169    3.087172    3.141476
MSE - Mean Squared Error                 17.777598   16.023303   15.989639   21.179925
RMSE - Root Mean Squared Error           4.216349    4.002912    3.781907    4.580536
R2 - Coefficient of Determination        0.815390    0.805501    -7.118546   0.784520
MAPE - Mean Absolute Percentage Error    0.311556    0.364589    0.660358    0.254285
EVS - Explained Variance Score           0.818426    0.806203    -1.855572   0.824080
MeAE - Median Absolute Error             2.164015    1.937039    2.542521    2.327501
D2 - D2 Absolute Error Score             0.612945    0.620560    -1.968504   0.553734
Pinball - Mean Pinball Loss              1.379849    1.247584    1.543586    1.570738
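
Two things in the metrics above are easier to read with the formulas at hand: the Pinball row is exactly half of the MAE row, and the CV test R2 is negative. The sketch below re-implements a few of the metrics in plain Python on made-up numbers. It assumes the pinball loss was computed with quantile `alpha=0.5`, which is consistent with the reported values but is not confirmed by the source.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pinball(y_true, y_pred, alpha=0.5):
    """Mean pinball (quantile) loss; alpha=0.5 is an assumption here."""
    losses = [
        alpha * max(t - p, 0.0) + (1.0 - alpha) * max(p - t, 0.0)
        for t, p in zip(y_true, y_pred)
    ]
    return sum(losses) / len(losses)


# Made-up fold: the observed values barely vary, the predictions miss by 2 to 3 units.
y_true = [10.0, 11.0, 12.0, 11.0]
y_pred = [13.0, 8.0, 14.0, 9.0]

fold_mae = mae(y_true, y_pred)
fold_mse = mse(y_true, y_pred)
fold_r2 = r2(y_true, y_pred)
fold_pinball = pinball(y_true, y_pred)
```

With `alpha=0.5` the pinball loss is always half the MAE, so that row carries no extra information. R2 turns negative whenever the squared error exceeds the variance of the observed values in the evaluated fold, which can happen with a moderate RMSE if the fold covers a narrow range of the target.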

Scatter plot matrix

Observed vs. Predicted and Residuals vs. Predicted

Check for …

Check the residuals to assess the goodness of fit:

  • Do they look like white noise, or is there a pattern?
  • Is there heteroscedasticity?
  • Is there non-linearity?

Normality of Residuals:

Check for …

  • Are residuals normally distributed?
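
As a numeric companion to the visual normality check, the sketch below computes sample skewness, kurtosis, and the Jarque-Bera statistic in plain Python. It is an illustrative helper, not part of the `strom` code, and the sample values are made up.

```python
def jarque_bera(residuals):
    """Return (skewness, kurtosis, JB statistic) for a list of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments with the biased (divide by n) convention.
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return skewness, kurtosis, jb


# Made-up residuals: a symmetric sample and a right-skewed one.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_skewed = [0.0] * 9 + [10.0]

sym_skew, sym_kurt, sym_jb = jarque_bera(symmetric)
skw_skew, skw_kurt, skw_jb = jarque_bera(right_skewed)
```

Under normality the statistic is approximately chi-squared with 2 degrees of freedom for large samples, so values well above roughly 6 suggest non-normal residuals at the 5% level. For small samples the approximation is poor and the plot remains the better guide.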

Leverage

Scale-Location plot

Residuals Autocorrelation Plot
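
The quantity shown in an autocorrelation plot can be sketched in a few lines of plain Python. This is a generic sample autocorrelation, written for illustration; the plotting helper used for this page may normalize slightly differently, and the residual values are made up.

```python
import math


def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at a given lag (0 <= lag < n)."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denominator = sum(c * c for c in centered)
    numerator = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return numerator / denominator


# Made-up residuals with a trend: neighbouring errors resemble each other.
trending = [float(i) for i in range(10)]
# Made-up residuals that flip sign at every step.
alternating = [1.0, -1.0] * 4

acf_trending = autocorrelation(trending, lag=1)
acf_alternating = autocorrelation(alternating, lag=1)

# Approximate 95% band for white noise of the same length.
band = 1.96 / math.sqrt(len(trending))
```

Values outside the band at low lags indicate that the model leaves time structure in the residuals, which for consumption data often means a missing seasonal or lagged feature.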

Residuals vs Time

Well, not that bad, but it is overfitting quite a lot.

♻️  stepit 'grid_search_pipe': is up-to-date. Using cached result for `strom.modelling.grid_search_pipe()` 2025-11-08 18:50:16

Writing pin:
Name: 'wd-svm'
Version: 20251108T185016Z-dd803
♻️  stepit 'svm_tuned': is up-to-date. Using cached result for `strom.modelling.assess_model()` 2025-11-08 18:50:16

Metrics

                                         Single Split             CV
Metric                                   train       test        test        train
MAE - Mean Absolute Error                2.341431    1.950447    2.147532    2.581758
MSE - Mean Squared Error                 15.653589   13.867218   7.877316    17.898406
RMSE - Root Mean Squared Error           3.956462    3.723871    2.711544    4.228775
R2 - Coefficient of Determination        0.837447    0.831672    -1.935593   0.817129
MAPE - Mean Absolute Percentage Error    0.184828    0.180864    0.474445    0.178073
EVS - Explained Variance Score           0.839705    0.847164    -1.089626   0.820615
MeAE - Median Absolute Error             1.497370    1.089080    1.790379    1.581437
D2 - D2 Absolute Error Score             0.671608    0.703396    -0.894147   0.631738
Pinball - Mean Pinball Loss              1.170716    0.975223    1.073766    1.290879

Scatter plot matrix

Observed vs. Predicted and Residuals vs. Predicted

Check for …

Check the residuals to assess the goodness of fit:

  • Do they look like white noise, or is there a pattern?
  • Is there heteroscedasticity?
  • Is there non-linearity?

Normality of Residuals:

Check for …

  • Are residuals normally distributed?

Leverage

Scale-Location plot

Residuals Autocorrelation Plot

Residuals vs Time

TODOs